ABSTRACT

Motivation: Genome-scale metabolic reconstructions summarize 
current knowledge about a target organism in a structured manner 
and as such highlight missing information. Such gaps can be filled 
algorithmically. Scalability limitations of available algorithms for gap 
filling hinder their application to compartmentalized reconstructions. 
Results: We present fastGapFill, a computationally efficient tractable 
extension to the COBRA toolbox that permits the identification of 
candidate missing knowledge from a universal biochemical reaction 
database (e.g. Kyoto Encyclopedia of Genes and Genomes) for a 
given (compartmentalized) metabolic reconstruction. The stoichiometric
consistency of the universal reaction database and of the metabolic
reconstruction can be tested to permit the computation of
biologically more relevant solutions. We demonstrate the efficiency 
and scalability of fastGapFill on a range of metabolic reconstructions. 
Availability and implementation: fastGapFill is freely available from 
http://thielelab.eu. 
Contact: ines.thiele@uni.lu 

Supplementary information: Supplementary data are available at 
Bioinformatics online. 
1 INTRODUCTION

A biomolecular network reconstruction summarizes biochemical,
physiological and genomic knowledge in a mathematically structured
electronic format (Palsson, 2006). It can be converted into
a computational model whose predictions have been used to
accelerate biotechnological and biomedical discoveries (Oberhardt
et al., 2010). The predictive capacity and accuracy of a model
depend on how comprehensive the reconstruction is and how
faithfully it represents the underlying biochemistry.
The comprehensiveness of a genome-scale metabolic reconstruction
can be improved by using the model to detect and fill
network gaps (Rolfsson et al., 2011). Similarly, reconstruction
fidelity can be improved by using the model to detect reconstruction
stoichiometry that is inconsistent with biochemistry (Gevorgyan
et al., 2008) or reactions that cannot carry a non-zero steady-state flux
(Vlassis et al., 2014).

Existing gap-filling algorithms, reviewed by Orth and Palsson
(2010), become intractable in high dimensions.
Decompartmentalizing a genome-scale compartmentalized
metabolic network reduces its dimension, rendering gap
filling tractable (Rolfsson et al., 2011). However, this approach
underestimates the amount of missing information because it
connects reactions that would normally not co-occur in the
same cellular compartment.

We present fastGapFill, the first scalable algorithm
capable of efficiently detecting and filling network gaps in
compartmentalized genome-scale models. fastGapFill draws on,
and extends, fastcore (Vlassis et al., 2014), an algorithm that
approximates the cardinality function to identify a compact
flux consistent model, that is, a model in which every reaction carries
a non-zero flux in at least one steady-state flux distribution. fastGapFill
integrates all three notions of model consistency, namely gap filling,
flux consistency and stoichiometric consistency, in a single
tool.

2 METHODS 

Formulation of the gap-filling problem. In the metabolic gap-filling
problem (Reed et al., 2006), one starts with a computational metabolic model,
M, that contains at least one blocked reaction, that is, a reaction that is
desired in the model but cannot carry a non-zero steady-state flux. A universal
database, e.g. the Kyoto Encyclopedia of Genes and Genomes (KEGG) (Kanehisa
and Goto, 2000), is searched for one or more reactions that, when added to
the model, fill at least one gap so that at least one formerly blocked
reaction can carry flux. Among other criteria, it may also be desirable to
compute a compact flux consistent model, in which the number of added
universal reactions is minimal. Although this problem arises here in
metabolic modeling, our algorithm is applicable to any biochemical network
model with gaps.
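To make the formulation concrete, the following Python sketch encodes a hypothetical toy model and a hypothetical four-reaction universal database. It then searches, by brute force over subsets of increasing size, for the smallest set of universal reactions that unblocks a target reaction. To keep the example dependency-free, it assumes a topological stand-in for flux consistency: reactions touching dead-end metabolites (metabolites that cannot be both produced and consumed) are pruned iteratively. Every reaction this flags is genuinely blocked, but reactions blocked for purely stoichiometric reasons can be missed, which a linear-programming check would catch. All names are illustrative, and the exhaustive search is exponential in the database size, which is exactly why this approach does not scale to databases such as KEGG.

```python
from itertools import combinations

# A reaction is (stoichiometry, reversible); stoichiometry maps metabolite -> coefficient.


def is_dead_end(met, active, model):
    """True if `met` cannot be both produced and consumed by the active reactions."""
    touching = [r for r in active if met in model[r][0]]
    if len(touching) < 2:
        # A metabolite touched by a single reaction forces that reaction's flux to 0.
        return True
    can_produce = any(model[r][1] or model[r][0][met] > 0 for r in touching)
    can_consume = any(model[r][1] or model[r][0][met] < 0 for r in touching)
    return not (can_produce and can_consume)


def blocked_reactions(model):
    """Iteratively prune reactions that touch dead-end metabolites (topological test)."""
    active = set(model)
    while True:
        mets = {m for r in active for m in model[r][0]}
        dead = {m for m in mets if is_dead_end(m, active, model)}
        newly_blocked = {r for r in active if dead & set(model[r][0])}
        if not newly_blocked:
            return set(model) - active
        active -= newly_blocked


def minimal_gap_fill(model, universal, target):
    """Smallest set of universal reactions whose addition unblocks `target`."""
    names = sorted(universal)
    for k in range(len(names) + 1):
        for subset in combinations(names, k):
            merged = {**model, **{r: universal[r] for r in subset}}
            if target not in blocked_reactions(merged):
                return set(subset)
    return None  # no combination of universal reactions unblocks `target`


# Hypothetical toy model M
model = {
    "EX_A": ({"A": -1}, True),
    "R1": ({"A": -1, "B": 1}, False),
    "EX_B": ({"B": -1}, True),
    "R2": ({"B": -1, "C": 1}, False),  # blocked: C is never consumed
}
# Hypothetical universal database U
universal = {
    "U1": ({"C": -1, "D": 1}, False),
    "U2": ({"D": -1, "E": 1}, False),
    "U3": ({"D": -1, "B": 1}, False),
    "EX_C": ({"C": -1}, True),
}

print(blocked_reactions(model))                  # {'R2'}
print(minimal_gap_fill(model, universal, "R2"))  # {'EX_C'}
```

Removing EX_C from the database makes the cheapest solution the two-reaction route U1 and U3 (C to D to B), showing that the minimal gap-filling set depends on what the database offers.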

Computing a compact flux consistent model. We repurposed the recently
developed fastcore algorithm (Vlassis et al., 2014) to compute a near-minimal
set of reactions that need to be added to an input metabolic
model M to render it flux consistent. fastcore takes as inputs M and a core
set of reactions C ⊆ M. It then greedily expands C by computing a set of
flux modes of M whose combined support contains all of C and a minimal
set of reactions from M \ C. This is achieved by a series of L1-norm regularized
linear programs that optimize a relaxed version of an (intractable) integer
program under cardinality constraints (Vlassis et al., 2014). Our
implementation efficiently identifies blocked reactions.
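For reference, the two kinds of linear program used by fastcore can be written roughly as follows, as we read Vlassis et al. (2014). Here S is the stoichiometric matrix of M, l and u are flux bounds, ε > 0 is a small flux threshold, J ⊆ C is the set of core reactions not yet shown to be active and P ⊆ M \ C is the set of non-core reactions to be penalized. Program (i) maximizes the number of reactions in J carrying at least ε flux (each z_i saturates at ε, so the objective approximates a count); program (ii) keeps those reactions active while minimizing the L1 norm of non-core flux, a convex surrogate for the number of non-core reactions used. The bookkeeping of J and P across iterations, and the handling of reversible reactions by flipping their sign, are simplified in this sketch.

```latex
\begin{aligned}
&\text{(i)}  && \max_{v,\,z}\ \sum_{i \in J} z_i
  && \text{s.t. } S v = 0,\ l \le v \le u,\ v_i \ge z_i,\ 0 \le z_i \le \varepsilon \ \ \forall i \in J \\
&\text{(ii)} && \min_{v,\,z}\ \sum_{i \in P} z_i
  && \text{s.t. } S v = 0,\ l \le v \le u,\ -z_i \le v_i \le z_i \ \ \forall i \in P,\ v_i \ge \varepsilon \ \ \forall i \in J
\end{aligned}
```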

Preprocessing to generate a global model. Let M be a cellularly
compartmentalized metabolic model with blocked reactions B, and let S be
its flux consistent part, so that S ∪ B = M. S is expanded by a universal
metabolic database U (e.g. KEGG): a copy of U is placed in each cellular
compartment of S (including the extracellular space) to generate SU. For
each metabolite occurring in a non-cytosolic compartment, a reversible
intercompartmental transport reaction is added. For each extracellular
metabolite, an exchange reaction is added. The union of these two
reaction sets (X) is added to SU to generate a global model, which is
extended with solvable blocked reactions (B_s ⊆ B), that is, reactions that
were previously flux inconsistent but become flux consistent when added
to the global model. In the extended global model (SUX), all reactions are
flux consistent. Note that not all blocked reactions in B may be solvable;
unsolvable ones are not present in SUX. All reactions of S and B_s
form the core set.
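The construction of the global model can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the fastGapFill implementation: reactions are hypothetical (stoichiometry, reversible) pairs, metabolite ids carry a compartment tag in COBRA-style brackets such as "glc[e]", the cytosol is "c" and the extracellular space is "e", and each transport reaction is assumed to connect a non-cytosolic compartment to the cytosol. The function names are hypothetical, and adding the solvable blocked reactions B_s is omitted because it requires a flux consistency test.

```python
def split_met(met_id):
    """Split a compartment-tagged id such as 'glc[e]' into ('glc', 'e')."""
    i = met_id.rindex("[")
    return met_id[:i], met_id[i + 1:-1]


def build_global_model(S, U, compartments, cytosol="c", extracellular="e"):
    """Return (SUX without B_s, X) for reactions stored as (stoichiometry, reversible).

    S uses compartment-tagged metabolite ids; U uses untagged ids.
    """
    SU = dict(S)
    # Place one copy of U in every compartment of S, including the extracellular space.
    for comp in compartments:
        for rid, (stoich, rev) in U.items():
            tagged = {f"{m}[{comp}]": coef for m, coef in stoich.items()}
            SU[f"{rid}[{comp}]"] = (tagged, rev)

    X = {}
    for met in sorted({m for stoich, _ in SU.values() for m in stoich}):
        base, comp = split_met(met)
        if comp != cytosol:
            # Assumed: reversible transport between the cytosol and this compartment.
            X[f"T_{base}_{cytosol}_{comp}"] = ({f"{base}[{cytosol}]": -1, met: 1}, True)
        if comp == extracellular:
            # Reversible exchange reaction across the system boundary.
            X[f"EX_{met}"] = ({met: -1}, True)
    return {**SU, **X}, X


# Hypothetical two-compartment model and one-reaction universal database
S = {
    "GLCt": ({"glc[e]": -1, "glc[c]": 1}, True),
    "HEX": ({"glc[c]": -1, "atp[c]": -1, "g6p[c]": 1, "adp[c]": 1}, False),
}
U = {"U1": ({"g6p": -1, "f6p": 1}, True)}

SUX, X = build_global_model(S, U, compartments=["c", "e"])
print(sorted(X))
# ['EX_f6p[e]', 'EX_g6p[e]', 'EX_glc[e]', 'T_f6p_c_e', 'T_g6p_c_e', 'T_glc_c_e']
```

Even this tiny example shows why the global model grows quickly: every metabolite of U appears once per compartment, and each non-cytosolic copy brings its own transport (and, extracellularly, exchange) reaction.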
Table 1. Gap filling of metabolic reconstructions on a standard desktop computer (Dell, Intel Core i5, 16 GB RAM, 64 bit)

Model name (reference)                    S^a          SUX^a             Comp^b  B     B_s  Gap-filling reactions  t_preprocessing (s)^c  t_fastGapFill (s)
Thermotoga maritima (Zhang et al., 2009)  418 x 535    14 020 x 31 566   2       116   84   87                     52                     21
Escherichia coli (Feist et al., 2007)     1501 x 2232  21 614 x 49 355   3       196   159  138                    237                    238
Synechocystis sp. (Nogales et al., 2012)  632 x 731    28 174 x 62 866   4       132   100  172                    344                    435
sIEC (Sahoo and Thiele, 2013)             834 x 1260   48 970 x 109 522  7       22    17   14                     1003                   194
Recon 2 (Thiele et al., 2013)             3187 x 5837  58 672 x 132 622  8       1603  490  400                    5552                   1826

a The dimensions are given as metabolites x reactions.
b Comp, compartments.
c Preprocessing includes computing the flux consistent metabolic model, merging of UX for all compartments of S and adding solvable blocked reactions B_s.
Note: Equal weighting of all reactions was used. See Supplementary Table S1 for candidate gap-filling solutions.



Computing a compact flux consistent subnetwork of a global model.
fastGapFill computes a subnetwork of SUX, consisting of all core
reactions plus a minimal number of reactions from UX, such that all
reactions in the resulting compact subnetwork are flux consistent. This is
achieved by using a slightly modified version of fastcore, in which a
vector of linear weightings prioritizes the addition of reactions within
UX. For instance, one may prioritize the addition of metabolic reactions
from U over transport reactions from X; alternatively, by varying the
weightings on non-core reactions, alternative compact sets of gap-filling
reactions may be identified.
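One way to read "a vector of linear weightings" is as per-reaction weights w_i > 0 on the L1 penalty that fastcore places on non-core flux. The sketch below assumes exactly that; S_SUX is the stoichiometric matrix of SUX, J ⊆ S ∪ B_s is the set of core reactions to be made active (assumed, for simplicity, to be oriented so that their flux is non-negative), and UX holds the candidate gap-filling reactions. With hypothetical weights such as w_i = 1 for i in U and w_i = 10 for i in X, a route through a few metabolic reactions can be preferred over a single transport or exchange reaction; changing the weights steers the solution towards alternative compact gap-filling sets.

```latex
\min_{v,\,z}\ \sum_{i \in UX} w_i\, z_i
\quad \text{s.t.}\quad
S_{SUX}\, v = 0,\quad l \le v \le u,\quad
-z_i \le v_i \le z_i \ \ \forall i \in UX,\quad
v_j \ge \varepsilon \ \ \forall j \in J
```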

Optional analysis of gap-filling reactions. We provide the option to 
compute a flux vector that maximizes the flux through each blocked 
reaction in turn, while minimizing the Euclidean norm of flux through 
the subnetwork of SUX computed by one call to fastGapFill. Note that 
flux through more than one solvable blocked reaction may be necessary 
to fill a gap, and that the computed flux vector may not be of minimum 
cardinality. 
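The optional analysis can be written as two successive optimization problems for each solvable blocked reaction b, where N denotes the stoichiometric matrix of the subnetwork returned by fastGapFill and l, u are flux bounds. This lexicographic formulation (a linear program followed by a quadratic program) is our assumption about how the two objectives are combined, not a description of the implementation.

```latex
\begin{aligned}
v_b^{\ast} &= \max_{v}\ v_b
  \quad \text{s.t.}\quad N v = 0,\ \ l \le v \le u, \\
\hat{v} &= \arg\min_{v}\ \tfrac{1}{2}\lVert v \rVert_2^2
  \quad \text{s.t.}\quad N v = 0,\ \ l \le v \le u,\ \ v_b = v_b^{\ast}.
\end{aligned}
```

Because the second problem is strictly convex, \hat{v} is unique; however, minimizing the Euclidean norm tends to spread flux over many reactions rather than minimizing their number, which is why the computed flux vector need not be of minimum cardinality.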

Computing stoichiometric consistency. Many reaction databases contain
stoichiometric inconsistencies (Gevorgyan et al., 2008), where the
stoichiometry of at least two reactions is inconsistent with conservation
of mass. For instance, the reactions A ⇌ B and A ⇌ B + C are
stoichiometrically inconsistent, as no positive molecular masses can be
assigned to A, B and C such that the mass on both sides of both reactions
is equal: the first reaction requires m(A) = m(B), whereupon the second
forces m(C) = 0. fastGapFill can identify stoichiometrically inconsistent
reactions and exclude them from filling gaps, using the scalable approach
to approximate cardinality maximization employed within fastcore to compute
a maximal set of metabolites in U that are involved in reactions that conserve mass.
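The example above can be checked with exact linear algebra. A mass vector m must satisfy Sᵀm = 0 with every entry positive. The Python sketch below is a swapped-in illustration, not fastGapFill's cardinality-maximization approach: it row-reduces Sᵀ with exact fractions and reports metabolites whose mass is forced to zero in every solution. Any such metabolite proves the reactions stoichiometrically inconsistent; the converse does not hold, since an empty result does not prove consistency in general (that needs a linear program). The function names are hypothetical.

```python
from fractions import Fraction


def rref(rows):
    """Reduced row echelon form of a list of rows, using exact arithmetic."""
    m = [[Fraction(x) for x in row] for row in rows]
    n_rows, n_cols = len(m), (len(m[0]) if m else 0)
    pivot_row = 0
    for col in range(n_cols):
        # Find a row at or below pivot_row with a nonzero entry in this column.
        pr = next((r for r in range(pivot_row, n_rows) if m[r][col] != 0), None)
        if pr is None:
            continue
        m[pivot_row], m[pr] = m[pr], m[pivot_row]
        piv = m[pivot_row][col]
        m[pivot_row] = [x / piv for x in m[pivot_row]]
        for r in range(n_rows):
            if r != pivot_row and m[r][col] != 0:
                factor = m[r][col]
                m[r] = [a - factor * b for a, b in zip(m[r], m[pivot_row])]
        pivot_row += 1
        if pivot_row == n_rows:
            break
    return m


def massless_metabolites(metabolites, reactions):
    """Metabolites whose mass is forced to zero by S^T m = 0.

    Each reaction (a dict metabolite -> coefficient) is one row of S^T.
    Metabolite j is zero in every solution exactly when the unit vector e_j
    lies in the row space of S^T, i.e. when some RREF row is nonzero only at j.
    """
    rows = [[rxn.get(met, 0) for met in metabolites] for rxn in reactions]
    forced = []
    for row in rref(rows):
        nonzero = [j for j, x in enumerate(row) if x != 0]
        if len(nonzero) == 1:
            forced.append(metabolites[nonzero[0]])
    return forced


mets = ["A", "B", "C"]
r1 = {"A": -1, "B": 1}          # A <-> B
r2 = {"A": -1, "B": 1, "C": 1}  # A <-> B + C

print(massless_metabolites(mets, [r1]))      # []
print(massless_metabolites(mets, [r1, r2]))  # ['C']
```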



3 IMPLEMENTATION 

An open-source MATLAB (MathWorks, Inc.) implementation
of fastGapFill is available as a cross-platform desktop
computer extension to the openCOBRA toolbox (Schellenberger
et al., 2011).

4 DISCUSSION 

We applied fastGapFill to five metabolic models (Table 1),
demonstrating its broad applicability and scalability across
gap-filling problems of various sizes. Alternative gap-filling solutions
can be computed by changing the weightings on non-core reactions
in the preprocessed problem. Note that all candidate metabolic
and transport reactions are hypotheses requiring experimental
validation (Rolfsson et al., 2011). Our implementation provides
an openCOBRA-compatible (Schellenberger et al., 2011) version
of the KEGG reaction database; however, any other universal
reaction database could be used with fastGapFill, as long as the
same input format is maintained and care is taken to correctly
identify identical metabolites. fastGapFill is the first scalable
approach to identify candidate missing knowledge in compartmentalized
metabolic reconstructions, and the approach is
applicable to any form of biochemical network gap-filling
problem.